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We present new theoretical and empirical results on the probability distribu- 
tions of species persistence times in natural ecosystems. Persistence times, 
^ defined as the timespans occurring between species' colonization and local 

extinction in a given geographic region, are empirically estimated from local 
^ observations of species' presence/absence. A connected sampling problem is 

(N presented, generalized and solved analytically. Species persistence is shown 

22 to provide a direct connection with key spatial macroecological patterns like 

■^ species-area and endemics-area relationships. Our empirical analysis per- 

^ tains to two different ecosystems and taxa: a herbaceous plant community 

O and a estuarine fish database. Despite the substantial differences in ecologi- 

,__! cal interactions and spatial scales, we confirm earlier evidence on the general 

Kj^ properties of the scaling of persistence times, including the predicted effects 

of the structure of the spatial interaction network. The framework tested 
here allows to investigate directly nature and extent of spatial effects in 
the context of ecosystem dynamics. The notable coherence between spatial 
and temporal macroecological patterns, theoretically derived and empirically 
verified, is suggested to underlie general features of the dynamic evolution of 
ecosystems. 
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1. Introduction 



Extinction rates, pointedly from habitat loss, have been defined as a cru- 



cial conservation problem of this century (Raup and Sepkoski, 1984 Wilcox 



and Murphy 1985; Pimm et al. 1988 Fahrig 1997 Ferraz et al. 2003). 



Direct and reliable estimation of true extinction rates remains problematic, 
however, and a lively debate exists over the possible overestimation of extinc- 
tions incurred through the standard use of methods based on the reversing 
of the species-area relationship (SAR), i.e. the number of species observed at 



increasing sampled areas (Green and Ostling, 2003 He and Hubbell 2011) 



Recently Bertuzzo et al. (2011) have proposed empirical and theoretical 



evidence for a new macroecological pattern, namely the distribution of per- 
sistence times for trophically equivalent, co-occurring species. The species 
persistence time (SPT) is defined as the time-span incurred between a species' 
emergence and its local extinction in a given geographic region. Using two 
different long-term ecological databases, related namely to the inventory of 
North American breeding birds and to Kansas grasslands, Bertuzzo et al. 



pOllj ) suggested that the empirical SPT distribution is characterized by a 
power-law function limited by an exponential cut-off determining the maxi- 
mum observed persistence time, which in turn is related to the spatial extent 
of the ecosystem. The advantage to concentrate on SPTs, is that SPT distri- 
butions have proved to be a robust measure of species turnover at different 
spatial scale. Although SPT distributions cannot provide predictions about 
extinction rates of specific species, they can describe the global evolution 
of the diversity of an ecosystem and give robust estimates for mean SPTs 



(Bertuzzo et al. 2011). 



The problem of the scale of observation becomes central in this framework. 
In fact, at the local scale, say an observational site for the presence/absence 
of breeding birds, the persistence time of a species is controlled by ecologi- 
cal processes operating at short timescales, like birth/death, dispersal, and 
contraction/expansion of geographic ranges. At this scale local extinctions 



are dynamically balanced by colonizations (MacArthur and Wilson, 1967 



Tilman 1994 Ricklefs and Scheuerlein 2003; Muneepeerakul et al. , 2008). 



At the global scale, species emergence and extinction are controlled by mech- 



anisms acting on evolutionary timescales (Brown and Kodric-Brown , 1977 



Diamond 1989). Interestingly, the scaling behavior proposed by Bertuzzo 



et al. (2011) to govern the transition from local to global scales is capable 



of effectively describing the overall dynamical evolution of the ecosystem di- 



versity. Scaling features also allow to predict SPT distributions for wide 
geographic areas from measures of persistence on smaller areas, which are 
much easier to monitor. 



To provide a theoretical explanation for the SPT pattern, Bertuzzo et al 



(2011 ) resorted to a spatially explicit model based on neutral theory. Neutral 



theory has been proposed as a unifying theoretical context for understanding 
ecological patterns (Hubbell 2001). Since its formulation, several studies 



have focused their attention on it (e.g. Volkov et al. 



2003 



and Vallade] [20031 [Chavel |2004l [Xlonso and McKanej [20041 |Azaele et~ar 
20061 [Muneepeerakul et all [20081 [Chisholm and Lichstein[ [2009[). Neutral 



Houchmandzadeh 



models are based on the assumption that, within a single trophic level, indi- 
vidual birth and death rates are species-independent. The main advantages 
of neutral models are that they are falsifiable, and that they are able to 
generate predictions for many different macroecological patterns. This is 
the case, for instance, of the relative species abundance (RSA) distribution 
(McGill, 2003 Volkov et al. 2005), the species-independent beta diversity 
patterns (Condit et al. , 2002 Zillio et al. 2005), species geographic range 



(Bertuzzo et al. , 2009) and the species-area relationship (Brown 1995 Zillio 



et al. 2008 O'Dwyer and Green, 2010). Remarkably, this is accomplished 



by invoking only basic ecological processes such as birth, death, migration 
and dispersal limitation. Although certain emergent ecological patterns are 
independent of the fine ecological details and often well predicted by neu- 
tral theory (Bell, 2001), this does not imply that the underlying ecological 
processes are neutral (Harte, 2003 Purves and Pacala, 2005). Although the 



SPT distribution framework does not necessarily require neutral processes as 
well, we shall keep them as a reference modelling frame for our theoretical 
speculations. 

This paper explores the subject of the SPT distribution further by ex- 



tending the empirical and theoretical analysis presented in Bertuzzo et al. 
(2011). In particular, we report here on the study of two new long-term 



datasets to test the robustness of the SPT macroecological pattern. More- 
over, we study in details the connections of the SPT distribution with the 
structure of the SAR and the endemics-area relationship (EAR, which gives 



the number of species that are confined within the sampled area; see [Kinzig 
and Harte, 2000). We also investigate the role of spatial interactions in the 



new SPT data and show that the effect of the environmental matrix (Rick- 



etts 2001) on SPT distributions depends on the overall geographic area in 



which the ecosystem is embedded: if the typical dispersal range is compa- 



rable to the total area, possibly because of dispersal limitation, then spatial 
interactions may be neglected. Finally we present and solve analytically a 



new particular case of a length-biased sampling problem (Cox, 1969) that 



arises in SPT data analysis; namely we find how to relate the SPTs derived 
from finite observational windows to the inherent SPT distribution of the 
ecosystem. 

The paper is organized as follows: in the next section we briefly summa- 
rize the spatially explicit model for SPTs. In subsection 2.1 we discuss the 
effects of data sampling on SPT distributions. We present a general mathe- 
matical framework to relate SPT theoretical distributions to those obtained 
from finite samples. In the third Section, we present novel empirical SPT 
distributions regarding two different ecosystems (a forest in New Jersey and 
an assembly of estuarine fish in the Bristol Channel) and compare them with 
the distributions predicted by the neutral theory. The role of spatial scales 
and their interactions with SPT distributions, and the intimate relation be- 
tween the SPT distribution and the SAR/EAR are also shown. A set of 
conclusions closes the article. 



2. The Model 



Neutral theory (Hubbell, 2001), which assumes that all the individuals 



within a given trophic level are competitively equivalent, offers a benchmark 
dynamics suggesting that many aspects of real biotic patterns may not re- 



quire a more complex model (Muneepeerakul et al. , 2008 Bertuzzo et al. 



2009). According to the assumption of neutrality (Hubbell, 2001), the dy- 



namics of a species in the ecosystem is fully specified by its effective birth 
and death rates b{n) and d{n), which depend exclusively on the population 
size n. Species abundance dynamics is described by the so called birth-death 



2003): 



master equation (ME) (Volkov et al. , 2003 Houchmandzadeh and Vallade 



dP{n,t) 
Jt 



b{n-l)P{n-l,t) + d{n+l)P{n + l,t)-{b{n) + d{n))P{n,t), (1) 



where P{n, t) is the probability, for a given species, of having a population of 
n individuals at time t. We note that h{n) and d{n) take into account several 
ecological processes that may increase or decrease the number of individuals 
in a species over time as, for instance, immigration or emigration. Assuming 
absorbing boundary conditions in n = 0, the probability density function 



(pdf) of the theoretical persistence time Pr can be expressed as (Pigolotti 
er"aLJ[2005| : 

Considering b{n) = n{l — v) and d{n) oc n, the above model describes ana- 
lytically the mean field dynamics (in the limit of infinite number of nodes) of 



the following lattice model, known as the voter model (HoUey, 1975 Durrett 



and Levin 1996 Zillio et al. 2005). v is termed diversification rate and it 



measures the frequency of appearance of new species in the system. Let us 
consider, for simplicity, a system with A^ nodes {N — )■ oo). Each node may 
be occupied by a single individual belonging to a species which is represented 
by a given color. The dynamics at each time step is as follows. A randomly 
selected individual dies. With probability v this site is occupied by a new 
species; with probability 1 — ly, the site is colonized by an offspring of a node 
randomly chosen in the lattice (mean field case). We note that in this frame- 
work we require that the probability that a new species enters the system be 
uniformly distributed in time. This seems like a reasonable assumption in the 
absence of strong environmental or historical gradients within the considered 
timescale. 

The spatial version of this model allows colonizations to occur only from 
one of the neighbors of the empty patch (i.e. dispersal limitation). The same 
model can be applied to 1D/3D lattices, or to a networked environment where 
dispersal is constrained directionally. By tracking every individual up to a 
statistically stationary state, it is possible to build the time series of species 
abundances and, from it, to compute numerically the SPT distribution. SPTs 
prove crucially dependent on the type of spatial connectivity interactions in 
the voter model. Exact and numerical results for the spatial voter model 



(Bertuzzo et al. , 2011), in fact, show that the SPT distribution is consistent 



with power-law shapes of the type: 



Pr{t) = Ct-^e 



-ut 



(3) 



where the scaling exponent a strictly depends on the connectivity structure 
and C = z/^~"/r(l — a) is the normalization constant {T{x) is the complete 
Gamma function of argument x ). The neutral model thus predicts the same 
functional shape for the STP distribution, a power law with the exponent a 
that is solely determined on the connectivity structure of the environment 



(Figure 1). Therefore a may be seen as a critical exponent (Stanley, 2000) 



that is, it describes the behavior of physical systems near phase transitions. 
In general it is argued that scaling exponents do not depend on the details of 
the physical system, but only on the dimension D of the system (e.g. D = 1 
on a line, D = 2 on an isotropic lattice and so on) or the range of the inter- 
actions (Stanley, 2000). This is not the first time that similarities between 



ecosystem dynamics and critical systems in physics have been pointed out 



(Banavar et al. 1999 Zillio et al., 2008). 



From Monte Carlo simulations of the spatial voter model, a exponents 



et al. 



for different geometric cases can be evaluated. It has been shown (Bertuzzo 
2011) that a ~ 1.5 for the ID lattice, a ~ 1.62 for the networked 
a ~ 1.82 and a ~ 1.92 respectively for the 2D and 3D lattice 
In the mean field case, the scaling limit {v ^ 1) of Pr(^) can 



landscape, 
(Figure 1) 



be calculated exactly (Pigolotti et al. 



2005) yielding p^if) 



-fiut), with 



f{z) = {z/{l — e~^)y^e~^. The robustness of the scaling behavior as a function 
only of the structure of the environmental matrix of interactions is deemed 
remarkable. 



2.1. An Ecological Length-Biased Sampling Problem 

Empirical data on persistence times for different species are available, es- 



pecially from succession studies and long-term ecological databases (Pigolotti 



et al. 2005 Bertuzzo et al. 2011). However, for further empirical analysis. 



sampling effects on SPTs need to be taken into account. This type of sam- 
pling problem belongs to the class of so called length-biased problems, that 



arise in a wide range of different disciplines (Patil and Rao, 1978 Vardi 



1982). 



We start with a thought experiment which we refer to as the stick-length 
sample problem. Consider an infinite number of sticks of different random 
length L distributed according to some unbiased pdf, say pi- Imagine to 
place randomly these sticks on a plane x-y, aligned in the same direction 
(e.g. along the x direction). The process is thus defined in a ID space, 
and the y axis is just a way to visualize different realizations of the same 
process. We denote by X^ the position of the left side of a given stick. X° 
is a random variable uniformly distributed in (0,oo). Consider only those 
sticks that cross a vertical line x = x* and measure the lengths of those sticks 
(Figure 2). 

We will denote by pi^ the length distribution of this sticks sample. The 
stick-length sample problem can be formulated with the following questions: 
a) how is pl^ related to pl? b) How are the left and right ends of the sticks 
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that cross x* (indicated by Lp, Figure 2) distributed? We now show how 
length measurements on particular samples of the stick population give length 
distributions which are different from that of the whole stick population. 

Using tools of probability theory, it can be shown (Appendix A) that the 
pdf of the length of the sticks that cross x*, in the limit x* — ?■ oo (to avoid 
boundary effects), is 

PL An = -jjy, (4) 

where (L) = j^ LpL(L)dL is the average stick length. Thus the length pdf 
of the sample constituted by those sticks that cross the observation point x*, 
Pl^, is not the same as the overall population stick length pdf, i.e. p^. 

Let us now introduce a second vertical line x = x < x*. We now want to 
measure the length of the portions of sticks Lp crossed by the two vertical 
lines { X = X and x = x*) that are comprised inside the space window 
Ax = X* — X (see Figure 2). If we denote by pl^ the pdf of Lp, then it can 
be shown (Appendix A) that: 

j^PL{L)dL 
PL,{1) = T^^ • (5) 

Finally, we can calculate the pdf p^/ of the random variable L' describing 
the length of a stick that is totally or partially comprised within the sampling 
window Ax. We note that in this case the maximum measured length of a 
stick is max{L'} = Ax. V can be expressed as a function of the other inde- 
pendent random variables L and X°, that have already been probabilistically 
characterized, in fact, we can distinguish four different cases (see Figure 3): 

• the stick is completely inside the window, then L' = L; 

• X^ > X and the stick crosses x = x*, then L' = x* — X° = Lp 

• X° < X < L + X^ < X* and the stick crosses x = x, then L' = 
X^ + L-x = Lp- 

• the stick is longer that the space window, then L' = Ax. 
As explained in detail in Appendix A, we find that: 

PL'il) = ■^((Ax-/K(/)0(Ax-/) + 



/•oo 

+ Q{Ax-l) / pLiL)dL + 

Jl>0 
poo 

+ Q{Ax-l) / PLiL)dL + 

Jl>0 

+ 5{l-Ax) f {l-Ax)pL{L)dL), (6) 

J Ax ^ 

where M is the normahzation constant for p^.'? and ^ix) and 0(x) are the 
Dirac delta distribution and the Heaviside step function, respectively. The 
last term of equation (pi) gives an atom probability in L = Ax corresponding 
to the fraction of sticks longer than the total length of the observational 
window (max{L'} = Ax). 

The probability distribution pi^^ of the sticks' length inside the window 
Ax follows directly from the first term of equation ([6]): 

Pi,(/) = i^(Ax-/K(L)e(Ax-/), (7) 

where M' is a proper normalization constant. 

The analytical results provided by Eqs. ([6]) and ([T]) are tested via numer- 
ical simulation, as shown in the bottom panel of Figure 2. First we generate 
10^ segments along the x — axis of random length drawn from an exponential 
pdf, i.e. Pl(0 = Ae^'^'. Afterwards, we consider only those segments that are 
within the spatial window Ax = 1/A and we reconstruct the stick length pdf 
Plj . Analogously, we build pi^i . Finally we compare the numerical distribu- 
tions with the the solution of Eqs. ^ and ([T]). As expected the agreement 
is perfect. 

2.1.1. Application to SPTs. 

Several real ecological sampling problems, both in time and space, can be 
mapped into the thought experiment described in the previous Section. In 
particular we now apply these results to SPTs. 

Assume that species abundance (or presence-absence) time series for a 
given ecosystem characterized by a single trophic level are available from 
field campaigns carried on in a time period of span At^ = tf — to years. 
From the collected data, SPTs can be measured and the empirical SPT dis- 
tribution likewise obtained. This distribution does not correspond to the 
theoretical SPT distribution, pr, in the same way as pi y^ Pu in the stick 



sample problem. In particular, Pr(t) is affected by the fact that species per- 
sistence can be recorded only within a finite temporal window At^. Thus, 
given the theoretical persistence-time pdf described by Eq. dsl), we are in- 
terested in the pdfs of the variables that we can actually measure, i.e. the 
persistence times r' of those species emerging before or during the observa- 
tions and that are still present or go locally extinct within the time window 
Atw (and the persistence time tj of only those species that emerge and go 
locally extinct within the observed time window). 

In our theoretical framework, the probability of a diversification event in 
a time step is constant. Thus the emergences of new species in the system 
obey a Poisson process, say U{t), with rate X = uN, where A^ is the total 
number of individuals in the system and A has the dimensions of the inverse 
of (generation) time. Consequently the emergence times T^ of species in 
the system are uniformly distributed. Therefore, we can map exactly this 
situation into the previous stick-length sample problem, with SPTs replacing 
stick lengths, i.e. L ^ r, L' ^ r', Lj -^ tj and X° -^ T°. Eqs. iQ and Q 
allow us to test the theoretical SPT pdi pr(t) obtained by the spatial model 
against the empirical SPT distribution obtained from the measurements. 

It is interesting to note how our thought experiment uses similar ap- 
proaches to the model known as the mid-domain effect, which describes how 
species ranges that are randomly shuffled within a bounded geographical do- 
main overlap increasingly toward the center of the domain, creating a mid- 



domain peak of species richness (Colwell et al. , 2004 Arita, 2004). In fact. 



SPTs are essentially distributions of species temporal ranges, and therefore, 
the sampling problem just presented may be interpreted as a sort of mid- 
domain problem in time. However there is a crucial difference between the 
two effects: while for spatial ranges there are physical geometric constraints 
imposed by hard boundaries in space (i.e. geographic boundaries), and thus 
species spatial ranges must be contained within the given geographic region 



(Colwell et al. , 2004 Arita, 2004), temporal ranges do not have constraints 



in time. It is in fact to be reminded that species temporal ranges are not 
limited by the observational time window, rather is our capacity to measure 
real species temporal ranges that is limited. 



3. Results. 

3.1. Empirical SPTs and Comparison with Model Results. 



Bertuzzo et al. (2011) presented a comparison between the theoretical 



model and the empirical SPT distributions for two different ecosystems (North 
American breeding birds and Kansas grassland) which is shown in Figure 4a,b 
for purpose of completeness. 

We now provide new evidence of the existence and robustness of the SPT 
distribution pattern by presenting empirical data on persistence times taken 
from two different ecological databases and comparing them with the ana- 
lytical results on SPT distributions. Specifically, we empirically characterize 
SPT distributions of two long-term datasets concerning ecosystems with very 
different characteristics: 1) a 44-year long study from the Buell- Small Suc- 
cession (BSS) Study of plants in New Jersey ; and 2) a 28- year long database 
of British estuarine fish collected at Hinkley Point (headland on the Bristol 
Channel coast of Somerset, England). 



The BSS study (Institute of Ecosystem Studies, 2008) includes ten fields 



that were released from agriculture and used to study succession dynamics. 
Permanent plots, measuring 48 m^ each, were established in every field. The 
permanent plots were sampled every year from 1958 to 1979, after which they 
were sampled alternatively every other year to the present day. The fields dif- 
fer in the year of release, thus we consider only data collected after the latest 
field abandonment year (1968). In order to avoid missing data in alternate 
years, the minimum sampled area (afterwards named cell) considered in the 
calculation of the empirical SPT distributions comprises two adjacent fields 
(96 m^). In this way each cell is populated with presence-absence records for 
each year. We repeat the same analysis for 3 other different scales: 4,6,8 ad- 
jacent fields (A=192, 288, 384 m^, respectively). We also analyze the whole 
system, which corresponds to an area oi A = A* = 480 m^. For every scale 
A of analysis, the presented results refer to the average over all possible com- 
binations of adjacent plots within the system (moving average procedure). 
A three-dimensional presence-absence matrix P is in this way built. Each 
element Pgtc of the matrix is equal to 1 if species s is observed during year t 
in a cell c, otherwise Pgtc = 0. The empirical SPT is defined as the number 
of consecutive years in which the measurements reveal the presence of the 
species in that geographic region (see Figure 3). For every cell c and every 
species s we measure persistence times from presence-absence time series de- 
rived from the second dimension/index of matrix P. The presence-absence 
time series thus form a vector of length T, where T is the total number of 
years of observation, that has as j-th component a one if species s is observed 
in cell c during the j-th year. Persistence time is defined as the length of 
a contiguous sequence of ones in the time series (Figure 3). For every scale 
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of analysis we consider all the measured persistence times regardless of the 
species they belong to, and we build the SPT empirical pdf at every cell scale 
(Figure 4c). 

As for the second database, fish samples were collected from the cooling- 
water filter screens at Hinkley Point B Power Station, situated on the south- 
ern bank of the Bristol Channel in Somerset (England) from 1980 to 2008. 
A full description of the intake configuration and sampling methodology is 
given in Magurran and Henderson (|2003). We only consider estuarine fish 



species (not crustaceous organisms). Empirical SPT pdf's can be computed 
as described for the BSS dataset (Figure 4d). Note, however, that in this 
dataset there is no spatial scale variability, because the samples were col- 
lected in a single point in space. Therefore we limit our analysis to a single 
spatial scale. 

Using Eqs. ^ and ^ and the observational data we performed the best 
fit of a and z/, the parameters of the theoretical SPT pdf. At any spatial 
scale, the scaling exponent and the diversification rate for the empirical SPT 
distributions are determined with a simultaneous nonlinear fit of observa- 
tional and analytical pdfs of r' and tj. Table 1 summarizes the results of the 
fit. Remarkably, in both cases the scaling exponents derived empirically are 
consistent with those predicted by the neutral voter model. In particular, 
for the fishes we find a = 1.97 ±0.06, which is compatible with the exponent 
a = 2 predicted by the mean field voter model (see Figure 4d). In fact, no 
spatial information is embedded in our empirical data and thus we do not 
expect to see the effect of dispersal limitation on the relevant ecosystem dy- 
namics. Figure 4c shows the case of the New Jersey plant communities, for 
which the best fit of Pr is a = 1.97 ± 0.12 (where a is the average of the ex- 
ponents over all the cell scales), which is also compatible with the mean field 
prediction a = 2. In this case the negligible effects of dispersal limitation 
on the ecosystem dynamic suggests that the average plant dispersal radius 
R is comparable with the linear dimension of the total sampled area, i.e. 
R ^ VA*, and thus the net result is a mean field dynamics not affected by 



spatial interactions (Bertuzzo et al. , 2011). The fact that the total sampled 



area of the ecosystem is very small [A* = 480 m^) strongly supports this 
interpretation of the data. 

While studying SPTs in a given ecosystem on the basis of presence/absence 



(or count) data, imperfect detection of species is a source of concern (Nichols 



et al. , 1998). Species, in particular animal species, are, in fact, routinely sam- 



pled with a detection probability much smaller than one. Although in general 
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the problem of imperfect detection is somewhat less relevant for herbaceous 
plant datasets, in the case of the BSS forest sampling occurs either annually 
or biannually, e.g. implying that seasonal variability in plant composition 
may be not picked up (and this varies from year to year). However, plants 
that go undetected because of yearly sampling affect the estimates of per- 
sistence times shorter than one year, thus not an issue for the scaling of 
SPTs. Moreover, plants in vegetative state are not identifiable, nor are small 
recruits, typically. For the fish database the power station intakes at Hink- 
ley Point are an effective sampler because of their location at the edge of a 
large inter-tidal mudfiat in an estuary. The filter screens have a solid square 
mesh of 10 mm and start to retain fish of standard length of about 25 mm 



(Henderson and Holmes, 2001). A 99% retention for many species occurs at 
the standard length of 40 mm. Nevertheless, incomplete sampling is present 
because only a small fraction of the water in the Bristol channel is filtered, 
and filtering thus only catches fish that cannot avoid being sucked by the wa- 
ter turbines. Hence, even in intensive sampling programs is difficult to tell 
whether absences are real or caused by incomplete detection/sampling. Note 
that the impact of imperfect sampling on SPTs of North America breeding 



birds has been analyzed in details in (Bertuzzo et al. , 2011). However, we 
acknowledge that to show that SPTs are generally a more robust alternative 
to estimating extinction rates, a systematic and widespread analysis of the 
effect of incomplete sampling and detection on SPTs needs to be undertaken. 
This is still lacking and will be tackled in details in future works. 

3.2. Relations among SPT, SAR and EAR 

The SPT distribution is also particularly interesting because it generates 
spatial patterns that can be related, within our theoretical framework, to 
other important macroecological patterns, as the SAR and the EAR rela- 
tionships. We can use the BSS data at different spatial scales to investigate 
how SPTs can give information also on the biogeography of the BSS plant 
ecosystem. In fact, the scale-invariant character of Pr indicates that a de- 
pends only on the spatial connectivity of the environment (i.e. networked, 
2D, ..), and not on the sampling area. On the contrary u, which accounts for 
immigration processes from species outside the local community, is argued to 
decrease with increasing sampling area, thus representing a clear signature of 
the geographic scale of the analysis. The SAR characterizes how the average 
number of observed species increases with increasing sample area and it is 



usually characterized by a power-law behavior, i.e. S ^ A^ (Brown, 1995). 
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Our theoretical framework allows for the linkage of the cut-off timescale Xjv 
with the spatial scale of analysis, and the SPT distribution to the SAR. In 
fact, in the neutral model, species emerge as a point Poisson process with 
rate z/iV and last for a lifetime r . Therefore, the mean number of species 
in the system at stationarity is (5) = vN{t) (Bertuzzo et al. , 2011). By 



deriving the scaling behavior of i^, A^ and (r) with respect to A, we can thus 
make a connection between SPTs and the SAR. 

The last term in Eq. (|6| represents the analytical expression of the frac- 
tion of species that are present during the whole observational time period 
(iS(At^)). It provides a tool to quantitatively estimate the diversification 
rate from the empirical SPT pdf. In fact, in order to infer with better preci- 
sion V from the observational data, we numerically find the value of v that 
minimizes iS(At^) at every different spatial scales A (see Appendix C for 
details) using the scaling exponent a. The best fit of the scaling law between 
V and A obtained in this way gives v ~ A~^ , with /3 = 1 ± 0.1 (Figure 5a), 
thus confirming that the diversification time v~^ scales almost linearly with 



the area (Zillio et al. 2005). 



Once the scaling law between v and A is established, in order to derive the 
SAR, we need to estimate the scaling between the total number of individuals 



A^ and the sampled area A. In Bertuzzo et al. (2011) an isometric relation 



N (X A (MacArthur and Wilson, 1967) was assumed. However the abundance 



data on the New Jersey plant communities show that this is not the case for 
that ecosystem. In fact the average total species coverage percentage over all 
the plots is 165%. Samplers start with the topmost layer, usually the trees. 



and work their way down to the species closest to the ground (Institute of 



Ecosystem Studies, 2008). Thus the exceedance in the coverage percentage 
is due to the overhanging among species. In order to evaluate the scaling 
relation between A^ and A, we then build a minimalist model, in which species 
are classified by their typical average size. Assuming biotic saturation, i.e. 
that each species covers completely the sampling region, in a plot of area 
A we can find at most [A/a\ individuals of characteristic size area a ^ A 
([x] denotes the smallest integer not greater that x), [A/2a\ individuals of 
characteristic area 2a and so forth. In general, the total number of individuals 
A^ in a region of area A scales as 



[A/a] 



N{A) (X ^ 



A A 



H 



n=l 



na 



A/ai 



(8) 
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where Hq = ^2^1=1 -'-/'"' i^ ^^^ harmonic number and H^/a ~ \n{A/a). The 



constant of proportionahty in Eq. rtSj) depends on the arbitrary choice of a. 
However since HA/a ~ ln(y4/a)+ sub-leading terms, a change of a in Xa leads 
to Hxa = I -ffA/a+ sub-leading terms and so its precise value is irrelevant 
since it can be absorbed in the proportionality constant (see below) to be 
determined from the best fit of the data. 

Finally, we calculate the asymptotic behavior of the mean SPT (r) = 
J tpr{t)dt (see Appendix C for mathematical details) and we get: 



(r) 



In(yl) 



for a < 2; 
for a = 2. 



(9) 



Putting all the above scaling results together, we find that for the BSS 
plant ecosystem (which is characterized by a ~ 2) the mean-field neutral 
framework predicts a power-law SAR with logarithmic correction: 



{S{A))=KA'-''\n{A)HA, 



(10) 



where K is the constant of proportionality. We determine the constant K 
in three different ways: 1) through the best fit of Eq. (10) on the empirical 



SAR data; 2) by imposing that in the total area A* we have the total number 
of species 5**; 3) by finding the value of K that gives the exact number of 
species in the smallest area {A = 96 m^) and then predicting the complete 
SAR curve (upscaling). Remarkably, all three methods yield K ^ 3.46. We 
also performed the best fit of the power-law SAR (5* ~ A^) on the empirical 
data, obtaining z = 0.34. We note that in our range of areas, this result it is 



not distinguishable from the SAR given by Eq. (10) as shown by Figure 5c 



In the special case of mean field interactions, we can also calculate the 
EAR. EAR is defined as the relation between the number of species endemic 



to a region and the area of that region (Kinzig and Harte 2000 Green and 



Ostling 2003). The EAR has been recently proposed by He and Hubbell 



(2011) as the correct approach in order to estimate extinction rates from 



habitat loss. The fact that the mean-field approach well describes the empir- 
ical SPT distribution suggests that in the BSS ecosystem the phenomenon 
of spatial aggregation or clustering is negligible. In fact mean-field dynamics 
indicates that species composition is well mixed in the ecosystem and thus it 
does not depend on spatial features. Therefore, in this case, we can evaluate 
the EAR curve just by reversing the SAR ( He and Hubbellf 2011): 

{E{A)) = S*- {SiA* - A)) = S*- \n{A* - A) Ha-.a, (H) 
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where E is the number of endemic species. Figure 5 summarizes all the above 
results and shows the comparison between the empirical and theoretical re- 
sults for the SAR and the EAR. 

4. Conclusions 

We have presented new theoretical and empirical evidence supporting 
the broad validity of a recently proposed macroecological pattern, the SPTs 
distribution. Specifically, we have completed (and described in detail) the 
theoretical treatment of the relevant ecosystem dynamics and of the de- 
rived sampling problem, and analyzed empirically two further rather diverse 
ecosystems (hosting respectively herbaceous plants and estuarine fishes). In 
both cases the observed SPT distributions display a power-law shape with a 
cut-off due to the finiteness of the observational time window, which is re- 
produced by the theoretical model. These results are expected to be robust 
with respect to relaxations of specific assumptions on ecological neutrality 

thus confirming our expectation that power laws. 



(Bertuzzo et al., 2011) 



observed for SPT pdfs, are the result of emergent behaviors independent of 
fine details of the system dynamics. 

The specific values of the scaling exponents of the SPT distributions de- 
pend only on a few key factors, namely the spatial dimension of the embed- 
ding ecosystem matrix and the nature of dispersal limitation. In particular, 
our results show that dispersal limitation does not occur in the ecological 
dynamics of both ecosystems under study. While this result could be ex- 
pected for the estuarine fishr inventory, where data were collected at a single 
point in space, quite surprisingly the BSS plant community does not show 
signs of dispersal-limited spatial interactions. We interpret this result as a 
consequence of the fact that the total sampled area in the BSS forest is very 
small. This suggests that the average plant dispersal radius is comparable 
to the characteristic size of the ecosystem, thus complying to mean-field-like 
dynamics. 

Because of the assessed, robust scale invariance character of the SPT dis- 
tribution (limited only by biogeographical finite-size effects), and because of 
its relation with other macro-ecological patterns, we have proposed that the 
SPTs distribution is a powerful synthetic descriptor of ecosystem biodiversity 
and of its associated dynamics. 

New data would allow for the investigation of other implications of the 
theoretical predictions. For instance, the neutral voter model predicts that 

15 



the scaling exponents of the SPT distributions of riparian ecosystems (i.e. 
networked environments where directional, anisotropic dispersal is forced by 
the structure of the fluvial environmental matrix) should be lower than those 
of 2D, 3D or mean-field ones. This result, if proven, would have remarkable 
consequences for conservation ecology, because it suggests that species that 
disperse isotropically have shorter average persistence times than species that 
are constrained to disperse along spatially constrained and anisotropic eco- 
logical corridors, like those provided by river networks. This, in turn, calls 
for long-term analysis of riparian ecosystems to test empirically the effects 
of river morphology on ecosystem dynamics. 

For these reasons we suggest that field biologists and ecologists should 
perhaps invest renewed efforts in collecting improved long-term datasets of 
ecosystem dynamics at different spatial and temporal scales and under dif- 
ferent dispersal conditions (for instance, archives beyond presence/absence 
of species within networked environments like riverine ecosystems for fresh- 
water fish) as they prove crucial for our deeper understanding of universal 
patterns in macroecology. The present study shows that long-term datasets 
can be profitably used to highlight crucial properties and spatial effects of 
ecosystem dynamics, including the role played by the underlying connectivity 
structure in shaping SPT distributions. 
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Appendix A. 

In this appendix we derive the analytical results of the stick-length sample 
problem presented in the main text. The set up of the thought experiment 
is fully described in the main text and in Figure 2. 

Specifically, the stick-length sample problem can be characterized by five 
random variables, namely: i) X° is the position of the left side of a given 
stick; ii) L describes the length of a stick; iii) L^. is the length of those sticks 
that cross x*; iv) Lp is the length of that part of the stick that crosses x* or 
X and fall within Aa;; v) V is the length of a stick that is totally or 
partially comprised inside Ax. 

X° and L are the two independent random variables of the problem, while 
Lc, Lp and L' can all be expressed in terms of X^ and L, as we will show in 
details. Assuming that pl is the unbiased length stick distribution, we now 
present a methodology that enable us to derive the length pdf pi^ ^'^^ the 
pdf Pip as a function oipi- The same methodology may be applied to 
obtain pii 

The random variable Lp can be expressed as a function of the random 
variables L and X". In fact, we have that the sticks that cross x* are those 
for which X^ < x* and X" + L > x* (see Figure 2). Therefore, if G is the 
Heaviside step function, the length of a stick that crosses x* is given by the 
random variable 

Lc = LQ{X^ + L-x*)Q{x* -X^). (A.l) 

We know that L is distributed (~) as pi and that X^ is uniformly 
distributed along the x-axis (in particular X° ~ ?7(o,a;*))- Therefore we have 
that the probability density function (pdf) pi^ is 

^ {6(1 - L)Q{X^ + L - x*)Q{x* - X^)) 
^"-^^^ (e(XO + L-x*)0(x*-XO)) ' ^ ' 



where 5 is the Dirac delta function. The numerator of Eq. (A. 2) simply 
gives the ensemble average, taken over the random variables A° and L, of 
different realizations of the event that occurs when a stick of length L 



intersects x*, while the denominator in (A. 2) assures the normalization 

condition. 

In order to solve the ensemble average of the latter equation, we split the 

calculation into two steps. First we calculate the conditional probability 
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function pl^{1\L) of a stick crossing the vertical line x*, given the fact that 
it has a length L, i.e. we perform only the average over X° 



As a second step, we marginalize Eq. ( A.3[ ) over L, i.e. we perform the 
average also over L: 

f;''p,(L)Hl-L)mm{^',L]dL 

Assuming that we do not have boundary effects (i.e. x* -^ +00) we obtain 
Eq. Q in the main text. 

The case of Lp can again be obtained using the same technique. Writing 
the random variable Lp as a function of L and X^ (see figure 2) we get 

Lp = {x* - x°)e(x* - x°)e(x° + L- x*)e(x°). (a.5) 

Following the same steps presented before, we obtain 

/- ^(/ - {x* - x°))e(x- - x")e(x° + L - x*)9(x°) 
^^^^^"^) = (e(x*-xo)e(xo + L,-x*)) ' (^-'^ 

and marginalizing over L we obtain 

p^^(;) = e(^* _ /)_£^(^i:iiM^, (A.7) 

Jq °^ m.m[x*, L]pL{L)dL 

that, in the limit x* — )■ 00, results in Eq. ([s]). The case of a stick crossing 
a; = £ is perfectly symmetric to the one discussed above and would lead to 
the same result. 
The results for L' and Lj follow similarly. 

Appendix B. 

In this section we present the calculations used to estimate the 
diversification rate z/ from the empirical SPT distributions of the BSS plant 
community. We denote by £^(At^) the fraction of species, evaluated from 
the data, that are present during the whole observational time period At^„. 



The last term in Eq. ^, denoted here as Aai^, gives the theoretical value 
of the latter quantity, i.e. the atom probability of the SPT distribution for 
t = Atw By using the theoretical SPT distribution described by Eq. ^ we 
can estimate ^(At^;): 



Z/1-" 



JAt^ i (1 - a, lytmin) 

r(2 — a, u/S.tyj) — u/S.twVil — a, v/S.tw) 



(B.l) 



where r[a, b] is the incomplete Gamma function, and tmin is the minimum 
SPT observed in the ecosystem (in general t^m — ;■ 0). If a < 2, then the 



dependence of Eq. (B.l) on utmin is weak, and thus, as z/t^m ^ 1? we can 

set l^tmin ^^ U. 

Therefore we calculate the values of the diversification rate u for a given 
Atu, (that in turn depends on the specific sample area A) by numerically 
minimizing the quantity 

S{u\At^) = (AAtM - £iAt^)f, (B.2) 

where the average scaling exponent a obtained by the best fit of the SPT 
pdf Pr' and p^j has been used for a . Eventually, repeating the same 
procedure using different time windows (corresponding to the different 
sampled areas) we obtain the scaling relation between u and A presented in 
the main text. 

Appendix C. 

In this appendix we derive the asymptotic behavior of the mean persistence 
times (r). We divide the calculation into two cases: a < 2, where the 
dispersal limitation is an important driver of the ecosystem dynamics and 
a = 2, the mean field case. The mean persistence time for a < 2 is 

[°? t^-'^e-^dt T(2-a ut ) 



Integrating by parts, the denominator of Eq. (CI) becomes: 



(ut ■ V^"' 1 

r(l - a, utrmn) = ^ '"'"^^ + . r(2 - a, z/t™„), (C.2) 

a — I I — a 
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and using the limit lim^i^^^^o r(2 - a, utmin) =T(2- a) > 0, Eq. ( |C.1[ ) 
simplifies to 

(r) ~ z/"-2. (C.3) 

In the case a = 2 we have instead 

where En{z) = J^ e~^^t~"'dt is the exponential integral function. The 



expansion of the exponential integral function for a; ~ is (Abramowitz 



and StegunI [1965[ pag. 229): 

Ei{x) ~ -— + x-log(x) -7 (C.5) 

2 

E^ix) ~ -y + x(log(x)+7-l) + l, (C.6) 

where 7 is the Euler constant. Substituting the latter expansion into Eq. 



(C.4) we obtain 

{r) ~ -t™„(log(z/) + 7) ~ - log(z/). (C.7) 

Therefore we find that (r) scales as the logarithm of z/ and, in turn, as the 
logarithm of the sampled area A (in fact we found that u ~ A~^). The 
logarithmic correction in the scaling behavior of (r) is supported by the 
data (see Figure 5 in the main text) confirming the hypothesis that the 
evolution of the BSS ecosystem is well mimicked by mean field dynamics. 
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Figure C.l: (a) Persistence-time exceedance probabilities Prit) (probability that species' 
persistence times t he >t) for the neutral individual-based model with nearest-neighbor 
dispersal implemented on the different topologies (redrawn from Bertuzzo et al. (2011)). 
Species persistence time distributions are crucially dependent on the type of spatial con- 
nectivity interactions in the voter model. 
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Figure C.2: Top panel: schematic representation of the stick-length sample problem. The 
variables that can be measured from the sample of sticks in Ax are L' and L/, both with 
a different distribution with respect to p{L). Bottom panel: comparison between analyt- 
ical SPT pdfs PL' and plj given by Eqs (p|-([7|, and those obtained through numerical 
simulation of the stick length problem starting from an unbiased stick length distribution 
Pl{1) = Ae~^', with Aa; — 1/A (in this particular example A = 0.025). pl, PLj and p^i 
have been shifted in the semi-log y plot for clarity. 
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Figure C.3: The species persistence-time (SPT) within a geographic region is defined 
as the time incurred between a species' emergence and its local extinction. Recurrent 
colonizations of a species define different SPTs times. The SPTs Tp, tj and r', that can 
be measured within the observational window, are in general different from the unbiased 
SPT T. The four different cases are discussed in the main text. 
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Figure C.4: Comparison between empirical distributions for (a) North American Breeding 
birds, (b) Kansas grasslands, (c) New Jersey BSS forest, (d) estuarine fish community 
and the corresponding theoretical SPT pdfs p{t) of r' (green), t" (blue). Filled circles 
and solid lines show observational distributions and fits, respectively. The spatial scale of 
analysis is A = 10, 000 km^ and Ai^ = 41 years for (a), A=l m^ and Ai^, = 38 years for 
(b), A=480 ve? and Ai^, = 44 years for (c) and Ai^u — 28 years for (d) (for details on the 



analysis of the databases relevant to (a) and (b) see supplementary material in Bertuzzo 
et al.[ 2011 1. The finiteness of the time window imposes a cut-off to Pr'it) and an atom 

to pT»{t), which corresponds to the fraction of species that are 
Pn (t) and pr' (t) have been shifted in the 



of probability in t ~ At^ 

always present during the observational time. 

log-log plot for clarity. 
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Figure C.5: Comparison between a) diversification rates for different areas (black dots) 
and the scaling law v ^ A~^ (dashed blue line); b) mean persistence time for each sample 
area (r(i^(A),Q;))(black dots) calculated using Eq. (Is]), and logarithmic relation r ^ ln(A) 
(dashed red line) predicted by the neutral model for the mean field case (a = 2); c) 
empirical SAR and EAR relationships (black points), power law fitting of the SAR and 
the corresponding EAR (gray continuous line) and the mean field SAR and EAR given by 
Eqs. ([To]) and ([ll]) (green dashed). 
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ecosystem 


A [m^ 


a 


V 


BSS plants 


98 


1.97 


0.0057 


BSS plants 


196 


1.97 


0.0024 


BSS plants 


294 


1.96 


0.0013 


BSS plants 


382 


1.96 


0.0011 


BSS plants 


480 


1.96 


0.0010 


Hinkley fish 


/ 


1.97 ±0.06 


0.00049 ± 0.00007 



Table C.l: Exponents for the Hinkley Point estuarine fish database and for the BSS 
dataset at every spatial scale of analysis. The values of the exponents are obtained by the 
simultaneous best fitting of Eqs. Q and (JTl) on the empirical SPT distributions. They 
suggest that both ecosystems are driven by a mean field dynamics. The value of a given 
in the text corresponds to a = (X]i=i «i)/5 = 1-97 ± Oa-, where o\ = X]i=i "'oi/S = 0.015 
(cr^. is the variance for the i—th a exponent from the fit). Values of f for the BSS plants 
are obtained by the minimization procedure explained in the text and in Appendix B. 
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